Exploring Zero-Shot Emotion Recognition in Speech Using Semantic-Embedding Prototypes

Authors

Abstract

Speech Emotion Recognition (SER) makes it possible for machines to perceive affective information. Our previous research differed from conventional SER endeavours in that it focused on recognising unseen emotions in speech autonomously through machine learning. Such a step would enable the automatic learning of unknown emerging emotional states. This type of learning framework, however, still relied on manual annotations to obtain multiple samples of each emotion. In order to reduce this additional workload, we herein propose a zero-shot framework that employs a per-emotion semantic-embedding paradigm for SER, instead of using sample-wise descriptors. Aiming to optimise the relationship between emotions, prototypes, and samples, the framework includes two types of strategies: sample-wise and emotion-wise. These strategies each apply a novel training process via specifically designed prototypes. We verify the utility of these approaches through an extensive experimental evaluation on several corpora, covering three aspects, namely the influence of different strategies, emotional-pair comparison, and the selection of prototypes and paralinguistic features. The results indicate that the framework is applicable to emotion recognition from speech, despite choosing optimal...
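The core idea of prototype-based zero-shot recognition can be illustrated with a minimal sketch: each emotion label is represented by a semantic-embedding prototype, and a sample (already projected into the same space by some learned mapping) is assigned to the nearest prototype. The prototype vectors, dimensions, and emotion set below are purely illustrative assumptions, not the paper's actual prototypes or mapping.

```python
import numpy as np

# Hypothetical per-emotion semantic-embedding prototypes (e.g. derived from
# word vectors of the emotion labels); values are illustrative only.
prototypes = {
    "anger":   np.array([0.9, 0.1, 0.0]),
    "sadness": np.array([0.1, 0.8, 0.2]),
    "joy":     np.array([0.2, 0.1, 0.9]),
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def zero_shot_classify(embedded_sample, prototypes):
    """Assign the emotion whose prototype is most similar in embedding space."""
    return max(prototypes, key=lambda e: cosine(embedded_sample, prototypes[e]))

# A speech sample assumed to be already mapped from paralinguistic features
# into the shared semantic space by a trained projection.
sample = np.array([0.15, 0.75, 0.25])
print(zero_shot_classify(sample, prototypes))
```

Because the decision depends only on the prototype vectors, an emotion unseen during training can be recognised simply by adding its semantic embedding to the prototype set, which is what makes the approach zero-shot.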


Related Articles

Zero-Shot Visual Recognition using Semantics-Preserving Adversarial Embedding Network

We propose a novel framework called Semantics-Preserving Adversarial Embedding Network (SP-AEN) for zero-shot visual recognition (ZSL), where test images and their classes are both unseen during training. SP-AEN aims to tackle the inherent problem — semantic loss — in the prevailing family of embedding-based ZSL, where some semantics would be discarded during training if they are nondiscriminati...


Gaussian Visual-Linguistic Embedding for Zero-Shot Recognition

An exciting outcome of research at the intersection of language and vision is that of zero-shot learning (ZSL). ZSL promises to scale visual recognition by borrowing distributed semantic models learned from linguistic corpora and turning them into visual recognition models. However, the popular word-vector DSM embeddings are relatively impoverished in their expressivity as they model each word as...


Transductive Multi-view Embedding for Zero-Shot Recognition and Annotation

Most existing zero-shot learning approaches exploit transfer learning via an intermediate-level semantic representation such as visual attributes or semantic word vectors. Such a semantic representation is shared between an annotated auxiliary dataset and a target dataset with no annotation. A projection from a low-level feature space to the semantic space is learned from the auxiliary dataset ...


Speech Emotion Recognition Using Scalogram Based Deep Structure

Speech Emotion Recognition (SER) is an important part of speech-based Human-Computer Interface (HCI) applications. Previous SER methods rely on the extraction of features and the training of an appropriate classifier. However, most of those features can be affected by emotionally irrelevant factors such as gender, speaking style, and environment. Here, an SER method has been proposed based on a concat...


Exploring Semantic Inter-Class Relationships (SIR) for Zero-Shot Action Recognition

Automatically recognizing a large number of action categories from videos is of significant importance for video understanding. Most existing works have focused on the design of more discriminative feature representations, and have achieved promising results when positive samples are plentiful. However, very limited effort has been spent on recognizing a novel action without any positive exemplars, whi...



Journal

Journal: IEEE Transactions on Multimedia

Year: 2022

ISSN: 1520-9210, 1941-0077

DOI: https://doi.org/10.1109/tmm.2021.3087098